In the US, NFL football is the primary focus. Baseball has also been historically popular.
Fantasy sports are also catching on in other countries. The fantasy sports market is growing in Europe, where some US-based companies chose to expand when stateside legislation threatened their domestic growth.
Fantasy sports are often categorized as a "game of skill", and as such tend to be permitted in areas that have gambling laws forbidding open betting on sports.
Daily fantasy sports have seen the largest jump: according to the Wikipedia article about fantasy sports, players spent \$5 to play in 2012, but up to \$257 per year by 2015.
Heavily modified: 1) Target website structure changed; 2) Original script did not report status/progress; 3) Made it at least somewhat restartable.
Player stats are updated every two weeks on average.
all_attributes = ['Body Type', 'Preferred Foot', 'International Reputation',
'Weak Foot', 'Skill Moves', 'Crossing', 'Finishing',
'HeadingAccuracy', 'ShortPassing', 'Volleys', 'Dribbling',
'Curve', 'FKAccuracy', 'LongPassing', 'BallControl',
'Acceleration', 'SprintSpeed', 'Agility', 'Reactions',
'Balance', 'ShotPower', 'Jumping', 'Stamina', 'Strength',
'LongShots', 'Aggression', 'Interceptions', 'Positioning',
'Vision', 'Penalties', 'Composure', 'DefensiveAwareness',
'Marking', 'StandingTackle', 'SlidingTackle', 'GKDiving',
'GKHandling', 'GKKicking', 'GKPositioning', 'GKReflexes',
'Attack Work Rate', 'Defence Work Rate']
pos_radial_plot(position_attribute_mean_df,position_attribute_max_df)
Many models oblige us to choose the number of clusters. I tried using the elbow method to find an optimal number of clusters for this data. Here's how that looks:
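from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import numpy as np

# Within-cluster sum of squares (WCSS) for k = 1..7
wcss = []
for i in range(1, 8):
    kmeans = KMeans(n_clusters=i)
    kmeans.fit(df_for_clustering)
    wcss.append(kmeans.inertia_)

plt.figure(figsize=(10, 5))
plt.plot(range(1, 8), wcss)
plt.xticks(np.arange(1, 8, step=1))
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')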
Choosing a consistent number of clusters across models makes their output easier to compare. The elbow method suggests just two or three clusters.
Two clusters provide no insight: all of the models invariably choose goalkeepers vs. non-goalkeepers as the two clusters.
Three clusters are another possibility: a third cluster splits the non-goalkeeper positions into roughly the attacking positions and the defensive positions. (Midfielders are split between these two clusters.)
I also tried four clusters. The fourth cluster centered around the midfield positions, but it was not stable under either the K-Means or Gaussian Mixture models.
model_comparison(km,df,df_for_clustering,km_pred)
model_comparison(hdb,df,df_for_clustering,hdb_pred)
Since the other two are hard-clustering models, I thought I would give a soft-clustering model, the Gaussian Mixture, a shot and see how its output compares to the K-Means and HDBSCAN models.
model_comparison(gm,df,df_for_clustering,gm_pred)
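To make the hard vs. soft distinction concrete, here is a minimal sketch on synthetic data. The array `X` is a made-up stand-in for `df_for_clustering`, and the 0.9 threshold is an arbitrary choice for illustration; only the scikit-learn calls are real.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for df_for_clustering: two nearby 2-D blobs (assumption).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 2)),
    rng.normal(loc=3.0, scale=1.0, size=(100, 2)),
])

# Hard clustering: every row gets exactly one label.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Soft clustering: every row gets a membership probability per component.
gm = GaussianMixture(n_components=2, random_state=0).fit(X)
gm_probs = gm.predict_proba(X)        # shape (200, 2); each row sums to 1
gm_labels = gm_probs.argmax(axis=1)   # collapse to hard labels for comparison

# Rows where no component reaches 90% probability sit near a cluster boundary.
borderline = int((gm_probs.max(axis=1) < 0.9).sum())
print(km_labels.shape, gm_probs.shape, borderline)
```

The probabilities are what a hard-clustering model cannot give you: they show which players sit between two clusters rather than forcing them into one.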
Goalkeepers ended up in their own cluster. There are some very specific attributes for goalkeepers, which I suspect is why it was so easy for the models to separate them from the rest of the pack.
For non-goalkeepers, most attributes are shared across positions, though some do favor a particular position. Those shared characteristics mean that the boundaries between clusters are less obvious.
cluster_pie(df,'Position')
What the model did is cluster players with similarly strong attributes. The desired attributes shift across positions on the pitch, meaning the cluster with the strongest forwards is not the same as the cluster with the strongest defensive players.
We can check the summary statistics for the ratings of players in each cluster, and whittle down our field by finding the cluster with the most highly-ranked players.
We need to identify the cluster containing the strongest players for each position.
Ideally: create a new measure of player strength based on attributes
Time constraints: lean on player rankings in the existing dataset
In an attempt to identify the best cluster for a given position, I performed some serious number crunching on various statistical measures of the attributes in each cluster, including:
And after doing that work, I realized that I just needed to look for the largest cluster for a given position--it consistently holds the highest-ranked players.
for position in ['ST', 'LB', 'RCM', 'GK']:  # or: all_positions
    score_distribution(df.loc[(df['Position'] == position) & (df[position] > 0), [position, 'cluster']], position)
best_cluster_per_position
{'LS': 2,
'ST': 2,
'RS': 2,
'LW': 2,
'LF': 2,
'CF': 2,
'RF': 2,
'RW': 2,
'LAM': 2,
'CAM': 2,
'RAM': 2,
'LM': 2,
'LCM': 2,
'CM': 2,
'RCM': 0,
'RM': 2,
'LWB': 0,
'LDM': 0,
'CDM': 0,
'RDM': 0,
'RWB': 0,
'LB': 0,
'LCB': 0,
'CB': 0,
'RCB': 0,
'RB': 0,
'GK': 1}
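The "largest cluster per position" rule of thumb described above can be sketched in a few lines of pandas. This is an illustration on a toy frame, not the code that produced the mapping above; `players` and `largest_cluster` are hypothetical names, and only the `Position` and `cluster` column names come from the notebook.

```python
import pandas as pd

# Toy stand-in for the notebook's df: one row per player (made-up data).
players = pd.DataFrame({
    'Position': ['ST', 'ST', 'ST', 'LB', 'LB', 'LB', 'GK', 'GK'],
    'cluster':  [2,    2,    0,    0,    0,    2,    1,    1],
})

# For each position, pick the cluster label that holds the most players.
largest_cluster = (
    players.groupby('Position')['cluster']
    .agg(lambda s: s.value_counts().idxmax())
    .to_dict()
)
print(largest_cluster)
```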
Once we've identified the cluster with the best players for a position, we can compare the players' strengths and salary ranges to determine where our money will be best spent.
for position in all_positions:
    score_wage(df.loc[(df[position] > 85) & (df['Wage'] > 0)], position)
I'm aiming for a 4-3-3 formation for my fantasy team (4 defenders, 3 midfielders, 3 forwards)--this is one of the most common formations.
We have data for a lot more than the ten positions this formation requires, so let's group the position rankings we have in ways that make sense.
starting_players = {'Left Forward': {'positions': ['LF','LS','LW'],
'affinity': 'L',},
'Center Forward': {'positions': ['CF','ST',],
'affinity': 'C',
'wage%': 1.5,},
'Right Forward': {'positions': ['RF','RS','RW'],
'affinity': 'R',},
'Left Midfield': {'positions': ['LM','LAM','CAM','LCM',],
'affinity': 'L',},
'Center Midfield': {'positions': ['CM','CAM','LCM','RCM'],
'affinity': 'C',},
'Right Midfield': {'positions': ['RM','RAM','CAM','RCM',],
'affinity': 'R',},
'Left Back': {'positions': ['LB','LWB','LDM','CDM',],
'affinity': 'L',},
'Right Back': {'positions': ['RB','RWB','RDM','CDM',],
'affinity': 'R',},
'Left-Center Back': {'positions': ['LCB','CB','LDM'],
'affinity': 'LC',},
'Right-Center Back': {'positions': ['RCB','CB','RDM'],
'affinity': 'RC',},
'Goalkeeper': {'positions': ['GK'],
'affinity': 'GK',
'wage%': 1.5,},
}
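One plausible way to turn each grouping into a single score per starting role is to average the player's ratings across the grouped positions. That averaging is my assumption about how the groups get used, not something stated here, and the `ratings` frame and `groups` dict below are made-up stand-ins for `df` and `starting_players`.

```python
import pandas as pd

# Toy ratings; column names follow the position codes used in the notebook.
ratings = pd.DataFrame({
    'Name': ['Player A', 'Player B'],
    'LF': [86, 80], 'LS': [85, 82], 'LW': [85, 87],
    'CF': [86, 81], 'ST': [85, 84],
})

# Hypothetical, trimmed-down version of the role -> positions mapping.
groups = {
    'Left Forward': ['LF', 'LS', 'LW'],
    'Center Forward': ['CF', 'ST'],
}

# One new column per starting role: the row-wise mean of its position ratings.
for role, positions in groups.items():
    ratings[role] = ratings[positions].mean(axis=1)

print(ratings[['Name', 'Left Forward', 'Center Forward']])
```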
Sanity check: make sure the positions I'm combining are members of the same cluster! If not, there will be problems...
check_position(starting_players,best_cluster_per_position)
Left Forward position list: ['LF', 'LS', 'LW']
Center Forward position list: ['CF', 'ST']
Right Forward position list: ['RF', 'RS', 'RW']
Left Midfield position list: ['LM', 'LAM', 'CAM', 'LCM']
Center Midfield position list: ['CM', 'CAM', 'LCM', 'RCM']
Modified positions for Center Midfield! Was: ['CM', 'CAM', 'LCM', 'RCM']; now: ['CM', 'CAM', 'LCM']
Right Midfield position list: ['RM', 'RAM', 'CAM', 'RCM']
Modified positions for Right Midfield! Was: ['RM', 'RAM', 'CAM', 'RCM']; now: ['RM', 'RAM', 'CAM']
Left Back position list: ['LB', 'LWB', 'LDM', 'CDM']
Right Back position list: ['RB', 'RWB', 'RDM', 'CDM']
Left-Center Back position list: ['LCB', 'CB', 'LDM']
Right-Center Back position list: ['RCB', 'CB', 'RDM']
Goalkeeper position list: ['GK']
Positions are grouped correctly based on clusters
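For readers who want to reproduce a check like this, here is a minimal sketch of how such a sanity check could work. `check_groups` is a hypothetical helper, not the notebook's `check_position`, and keeping the majority cluster is my assumption about the rule. The cluster numbers in the example are taken from `best_cluster_per_position` above.

```python
from collections import Counter

def check_groups(groups, cluster_by_position):
    """Return a copy of `groups` with off-cluster positions dropped.

    Hypothetical helper: for each starting role, keep only the positions
    that belong to the role's most common cluster.
    """
    cleaned = {}
    for role, positions in groups.items():
        clusters = [cluster_by_position[p] for p in positions]
        majority = Counter(clusters).most_common(1)[0][0]
        kept = [p for p in positions if cluster_by_position[p] == majority]
        if kept != positions:
            print(f"Modified positions for {role}! Was: {positions}; now: {kept}")
        cleaned[role] = kept
    return cleaned

clusters = {'CM': 2, 'CAM': 2, 'LCM': 2, 'RCM': 0, 'GK': 1}
groups = {'Center Midfield': ['CM', 'CAM', 'LCM', 'RCM'], 'Goalkeeper': ['GK']}
cleaned = check_groups(groups, clusters)
```

Returning a cleaned copy rather than mutating the input keeps the original grouping available for comparison.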
This function grew iteratively, with each major iteration looking something like:
df = best_for_position(df,starting_players,debug=True)
Starter: Left Forward
Name Wage Position affinity_bonus Left Forward rankweight
0 L. Sané 195000 LW 1.5 85.333333 4.779834
118 P. Dybala 215000 CF 1.0 86.333333 2.992934
152 R. Sterling 255000 LW 1.5 86.333333 3.785181
155 Neymar Jr 290000 LW 1.5 90.666667 3.855105
203 M. Salah 240000 RW 1.0 89.666667 3.003875
206 H. Kane 220000 ST 1.0 87.000000 2.993195
244 S. Mané 220000 LW 1.5 89.000000 4.806607
245 G. Bale 250000 RM 1.0 87.000000 2.634012
249 R. Lewandowski 235000 LS 1.5 87.333333 4.251709
267 P. Aubameyang 205000 LM 1.5 86.333333 4.708395
472 E. Cavani 195000 ST 1.0 85.666667 3.224044
539 K. Benzema 285000 CF 1.0 86.333333 2.257827
Name Club Wage Position affinity Left Forward
244 S. Mané Liverpool 220000 LW L 89.0
Starter: Center Forward
wage% is set for Center Forward: 1.5
Name Wage Position affinity_bonus Center Forward \
96 Cristiano Ronaldo 405000 ST 1.5 93.5
118 P. Dybala 215000 CF 1.5 85.5
141 A. Griezmann 370000 CAM 1.5 89.0
155 Neymar Jr 290000 LW 1.0 89.5
203 M. Salah 240000 RW 1.0 89.0
206 H. Kane 220000 ST 1.5 88.0
209 K. De Bruyne 370000 RCM 1.0 87.5
214 H. Son 185000 LS 1.0 87.5
245 G. Bale 250000 RM 1.0 87.0
249 R. Lewandowski 235000 LS 1.0 88.5
267 P. Aubameyang 205000 LM 1.0 86.5
338 L. Suárez 355000 ST 1.5 90.5
444 S. Agüero 300000 ST 1.5 90.0
472 E. Cavani 195000 ST 1.5 87.0
539 K. Benzema 285000 CF 1.5 86.5
rankweight
96 3.027409
118 4.360649
141 2.857982
155 2.472129
203 2.937371
206 4.646400
209 1.810600
214 3.621199
245 2.634012
249 2.949592
267 3.157145
338 3.131905
444 3.645000
472 5.065408
539 3.406393
Name Club Wage Position affinity Center Forward
472 E. Cavani Paris Saint-Germain 195000 ST C 87.0
Starter: Right Forward
Name Wage Position affinity_bonus Right Forward \
0 L. Sané 195000 LW 1.0 85.333333
96 Cristiano Ronaldo 405000 ST 1.0 93.000000
118 P. Dybala 215000 CF 1.0 86.333333
141 A. Griezmann 370000 CAM 1.0 89.333333
152 R. Sterling 255000 LW 1.0 86.333333
155 Neymar Jr 290000 LW 1.0 90.666667
203 M. Salah 240000 RW 1.5 89.666667
206 H. Kane 220000 ST 1.0 87.000000
209 K. De Bruyne 370000 RCM 1.0 88.333333
245 G. Bale 250000 RM 1.5 87.000000
249 R. Lewandowski 235000 LS 1.0 87.333333
267 P. Aubameyang 205000 LM 1.0 86.333333
338 L. Suárez 355000 ST 1.0 89.666667
444 S. Agüero 300000 ST 1.0 89.333333
539 K. Benzema 285000 CF 1.0 86.333333
rankweight
0 3.186556
96 1.986067
118 2.992934
141 1.926810
152 2.523454
155 2.570070
203 4.505812
206 2.993195
209 1.862825
245 3.951018
249 2.834472
267 3.138930
338 2.030789
444 2.376399
539 2.257827
Name Club Wage Position affinity Right Forward
203 M. Salah Liverpool 240000 RW R 89.666667
Starter: Left Midfield
Name Wage Position affinity_bonus Left Midfield \
96 Cristiano Ronaldo 405000 ST 1.0 89.25
118 P. Dybala 215000 CF 1.0 87.25
141 A. Griezmann 370000 CAM 1.0 88.75
152 R. Sterling 255000 LW 1.5 86.25
155 Neymar Jr 290000 LW 1.5 90.75
158 P. Pogba 250000 RDM 1.0 86.75
209 K. De Bruyne 370000 RCM 1.0 90.75
214 H. Son 185000 LS 1.5 86.25
224 C. Eriksen 205000 CAM 1.0 88.50
250 Coutinho 175000 CAM 1.0 86.50
309 Isco 245000 LW 1.5 85.50
316 M. Reus 170000 ST 1.0 87.25
338 L. Suárez 355000 ST 1.0 87.25
355 Roberto Firmino 170000 CF 1.0 86.25
444 S. Agüero 300000 ST 1.0 85.50
446 Thiago 180000 CM 1.0 86.50
450 L. Modrić 340000 RCM 1.0 89.00
495 David Silva 265000 LCM 1.0 87.00
rankweight
96 1.755374
118 3.089284
141 1.889311
152 3.774230
155 3.865745
158 2.611370
209 2.019939
214 5.202317
224 3.381240
250 3.698369
309 3.826692
316 3.907036
338 1.870975
355 3.774230
444 2.083421
446 3.595637
450 2.073438
495 2.484917
Name Club Wage Position affinity Left Midfield
214 H. Son Tottenham Hotspur 185000 LS L 86.25
Starter: Center Midfield
Name Wage Position affinity_bonus Center Midfield \
96 Cristiano Ronaldo 405000 ST 1.5 86.333333
141 A. Griezmann 370000 CAM 1.5 87.333333
155 Neymar Jr 290000 LW 1.0 87.666667
158 P. Pogba 250000 RDM 1.0 87.000000
209 K. De Bruyne 370000 RCM 1.0 90.333333
224 C. Eriksen 205000 CAM 1.5 88.333333
250 Coutinho 175000 CAM 1.5 85.333333
338 L. Suárez 355000 ST 1.5 85.666667
355 Roberto Firmino 170000 CF 1.5 85.666667
446 Thiago 180000 CM 1.5 87.000000
450 L. Modrić 340000 RCM 1.0 89.666667
451 M. Pjanić 180000 CDM 1.5 86.000000
495 David Silva 265000 LCM 1.0 86.666667
519 T. Kroos 330000 LCM 1.0 86.666667
777 I. Rakitić 245000 CM 1.5 85.333333
878 I. Gündoğan 180000 CM 1.5 85.666667
rankweight
96 2.383262
141 2.700410
155 2.323301
158 2.634012
209 1.992243
224 5.043259
250 5.326100
338 2.656431
355 5.547253
446 5.487525
450 2.120382
451 5.300467
495 2.456464
519 1.972615
777 3.804357
878 5.239072
Name Club Wage Position affinity Center Midfield
355 Roberto Firmino Liverpool 170000 CF C 85.666667
Starter: Right Midfield
Name Wage Position affinity_bonus Right Midfield \
39 K. Mbappé 155000 RM 1.5 89.000000
100 O. Dembélé 195000 RW 1.5 85.333333
118 P. Dybala 215000 CF 1.0 88.666667
141 A. Griezmann 370000 CAM 1.0 89.666667
152 R. Sterling 255000 LW 1.0 88.000000
155 Neymar Jr 290000 LW 1.0 92.666667
158 P. Pogba 250000 RDM 1.5 86.666667
206 H. Kane 220000 ST 1.0 85.666667
209 K. De Bruyne 370000 RCM 1.0 91.000000
224 C. Eriksen 205000 CAM 1.0 88.666667
245 G. Bale 250000 RM 1.5 85.333333
250 Coutinho 175000 CAM 1.0 87.333333
309 Isco 245000 LW 1.0 86.000000
316 M. Reus 170000 ST 1.0 88.666667
338 L. Suárez 355000 ST 1.0 88.333333
404 A. Di María 150000 RM 1.5 86.666667
444 S. Agüero 300000 ST 1.0 87.333333
446 Thiago 180000 CM 1.0 86.333333
450 L. Modrić 340000 RCM 1.0 88.666667
495 David Silva 265000 LCM 1.0 87.333333
539 K. Benzema 285000 CF 1.0 86.333333
2485 Xavi 150000 CM 1.0 86.666667
rankweight
39 6.822281
100 4.779834
118 3.242222
141 1.948459
152 2.672439
155 2.743927
158 3.905778
206 2.857676
209 2.036678
224 3.400379
245 3.728270
250 3.806292
309 2.596147
316 4.100457
338 1.941536
404 6.509630
444 2.220337
446 3.574893
450 2.050228
495 2.513589
539 2.257827
2485 4.339753
Name Club Wage Position affinity Right Midfield
39 K. Mbappé Paris Saint-Germain 155000 RM R 89.0
Starter: Left Back
Name Wage Position affinity_bonus Left Back rankweight
275 N. Kanté 235000 LDM 1.5 88.75 4.461989
492 Sergio Ramos 300000 RCB 1.0 85.50 2.083421
583 A. Vidal 205000 LCM 1.0 85.75 3.075732
692 Jordi Alba 240000 LB 1.5 85.50 3.906415
725 Carvajal 205000 RB 1.0 85.50 3.048909
3433 P. Lahm 140000 RB 1.0 87.75 4.826289
Name Club Wage Position affinity Left Back
3433 P. Lahm FC Bayern München 140000 RB R 87.75
Starter: Right Back
Name Wage Position affinity_bonus Right Back rankweight
275 N. Kanté 235000 LDM 1.0 88.75 2.974659
492 Sergio Ramos 300000 RCB 1.0 85.50 2.083421
583 A. Vidal 205000 LCM 1.0 85.75 3.075732
692 Jordi Alba 240000 LB 1.0 85.50 2.604277
725 Carvajal 205000 RB 1.5 85.50 4.573364
Name Club Wage Position affinity Right Back
725 Carvajal Real Madrid 205000 RB R 85.5
Starter: Left-Center Back
Name Wage Position affinity_bonus Left-Center Back \
166 V. van Dijk 200000 LCB 1.5 88.666667
228 A. Laporte 195000 CB 1.0 86.000000
238 S. Umtiti 210000 LCB 1.5 85.666667
275 N. Kanté 235000 LDM 1.0 87.333333
305 K. Koulibaly 150000 LCB 1.5 86.666667
365 Fernandinho 200000 LCB 1.5 85.666667
492 Sergio Ramos 300000 RCB 1.0 89.000000
576 Casemiro 240000 CDM 1.0 87.000000
583 A. Vidal 205000 LCM 1.5 87.000000
616 Sergio Busquets 300000 CDM 1.0 86.333333
617 T. Alderweireld 155000 RCB 1.0 86.666667
783 G. Chiellini 215000 LCB 1.5 86.000000
815 D. Godín 135000 RCB 1.0 87.000000
853 Piqué 285000 RCB 1.0 87.333333
869 Thiago Silva 135000 LCB 1.5 86.333333
894 J. Vertonghen 155000 LCB 1.5 86.333333
1055 L. Bonucci 160000 RCB 1.0 86.333333
rankweight
166 5.228082
228 3.261826
238 4.490633
275 2.834472
305 6.509630
365 4.715165
492 2.349897
576 2.743762
583 4.818315
616 2.144936
617 4.199761
783 4.437600
815 4.877800
853 2.337197
869 7.149786
894 6.227233
1055 4.021754
Name Club Wage Position affinity Left-Center Back
869 Thiago Silva Paris Saint-Germain 135000 LCB LC 86.333333
Starter: Right-Center Back
Name Wage Position affinity_bonus Right-Center Back \
166 V. van Dijk 200000 LCB 1.0 88.666667
228 A. Laporte 195000 CB 1.0 86.000000
238 S. Umtiti 210000 LCB 1.0 85.666667
275 N. Kanté 235000 LDM 1.0 87.333333
305 K. Koulibaly 150000 LCB 1.0 86.666667
365 Fernandinho 200000 LCB 1.0 85.666667
492 Sergio Ramos 300000 RCB 1.5 89.000000
576 Casemiro 240000 CDM 1.0 87.000000
583 A. Vidal 205000 LCM 1.0 87.000000
616 Sergio Busquets 300000 CDM 1.0 86.333333
617 T. Alderweireld 155000 RCB 1.5 86.666667
783 G. Chiellini 215000 LCB 1.0 86.000000
815 D. Godín 135000 RCB 1.5 87.000000
853 Piqué 285000 RCB 1.5 87.333333
894 J. Vertonghen 155000 LCB 1.0 86.333333
1055 L. Bonucci 160000 RCB 1.5 86.333333
rankweight
166 3.485388
228 3.261826
238 2.993755
275 2.834472
305 4.339753
365 3.143443
492 3.524845
576 2.743762
583 3.212210
616 2.144936
617 6.299642
783 2.958400
815 7.316700
853 3.505795
894 4.151488
1055 6.032632
Name Club Wage Position affinity Right-Center Back
815 D. Godín Inter 135000 RCB RC 87.0
Starter: Goalkeeper
wage% is set for Goalkeeper: 1.5
Name Wage Position affinity_bonus Goalkeeper rankweight
345 T. Courtois 235000 GK 1 88.0 2.899881
377 Alisson 155000 GK 1 89.0 4.548187
411 J. Oblak 125000 GK 1 91.0 6.028568
471 Ederson 185000 GK 1 88.0 3.683632
766 M. Neuer 155000 GK 1 88.0 4.396594
1142 H. Lloris 150000 GK 1 88.0 4.543147
Name Club Wage Position affinity Goalkeeper
411 J. Oblak Atlético Madrid 125000 GK G 91.0
# This renders poorly in the notebook...sorry.
from IPython.display import HTML
HTML(filename='starting_players.html')
Left Forward: S. Mané (27, Senegal), Liverpool, Rank: 89.0
Center Forward: E. Cavani (32, Uruguay), Paris Saint-Germain, Rank: 87.0
Right Forward: M. Salah (27, Egypt), Liverpool, Rank: 89.7

Left Midfield: H. Son (26, Korea Republic), Tottenham Hotspur, Rank: 86.2
Center Midfield: Roberto Firmino (27, Brazil), Liverpool, Rank: 85.7
Right Midfield: K. Mbappé (20, France), Paris Saint-Germain, Rank: 89.0

Left Back: P. Lahm (32, Germany), FC Bayern München, Rank: 87.8
Left-Center Back: Thiago Silva (34, Brazil), Paris Saint-Germain, Rank: 86.3
Right-Center Back: D. Godín (33, Uruguay), Inter, Rank: 87.0
Right Back: Carvajal (27, Spain), Real Madrid, Rank: 85.5

Goalkeeper: J. Oblak (26, Slovenia), Atlético Madrid, Rank: 91.0
NOTE: I didn't finish implementing a salary cap.
Example: Sadio Mané
budget = 10000000
spent = 0
for starter in starting_players:
    # Look up the selected player's wage by ID; .iloc[0] takes the single matching row
    player_wage = int(df.loc[df['ID'] == starting_players[starter]['ID'], 'Wage'].iloc[0])
    print("{}:\t{}, {}".format(starter, starting_players[starter]['Name'], player_wage))
    spent += player_wage
print('Spent: {}; Remaining budget: {}'.format(spent, budget - spent))
Left Forward: S. Mané, 220000
Center Forward: E. Cavani, 195000
Right Forward: M. Salah, 240000
Left Midfield: H. Son, 185000
Center Midfield: Roberto Firmino, 170000
Right Midfield: K. Mbappé, 155000
Left Back: P. Lahm, 140000
Right Back: Carvajal, 205000
Left-Center Back: Thiago Silva, 135000
Right-Center Back: D. Godín, 135000
Goalkeeper: J. Oblak, 125000
Spent: 1905000; Remaining budget: 8095000